2022-05-10

Data set overview

Title: “Peripheral Blood Mitochondrial DNA Copy Number Is Associated with Prostate Cancer Risk and Tumor Burden”

Authors: Weimin Zhou, Min Zhu, Ming Gui, Lihua Huang, Zhi Long, Li Wang, Hui Chen, Yinghao Yin, Xianzhen Jiang, Yingbo Dai, Yuxin Tang, Leye He, Kuangbiao Zhong (2014)

Goal: Determine whether mitochondrial DNA (mtDNA) copy number in peripheral blood leukocytes is a predictor of prostate cancer

Loading

  • Dimensions: 392 rows, 13 columns

  • Control and cancer case groups are balanced in size
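
As a rough illustration of the loading checks above, the sketch below reports the dimensions of a table and the share of rows in each group. The rows and the column names `id`, `group` and `mtdna` are made up for the example and are not the study data.

```python
from collections import Counter

# Hypothetical rows standing in for the loaded data set (not the study data).
rows = [
    {"id": 1, "group": 0, "mtdna": 0.91},
    {"id": 2, "group": 1, "mtdna": 1.12},
    {"id": 3, "group": 0, "mtdna": 0.87},
    {"id": 4, "group": 1, "mtdna": 1.05},
]


def dimensions(table):
    """Return (number of rows, number of columns) of a list-of-dicts table."""
    n_cols = len(table[0]) if table else 0
    return len(table), n_cols


def group_shares(table, key="group"):
    """Return the fraction of rows that fall in each level of `key`."""
    counts = Counter(row[key] for row in table)
    return {level: n / len(table) for level, n in counts.items()}


print(dimensions(rows))    # (4, 3)
print(group_shares(rows))  # {0: 0.5, 1: 0.5}
```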

Cleaning

  • Check for duplicates

  • Keep only samples with successful PCR (pcr_success)

  • New dimensions: 387 rows, 13 columns
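
The two cleaning steps above could look roughly like this in plain Python. Only the column name `pcr_success` comes from the slides. The example rows, the other column names, and the coding of a successful PCR as `1` are assumptions made for illustration.

```python
# Hypothetical records; only the column name `pcr_success` comes from the slides.
rows = [
    {"id": 1, "pcr_success": 1, "mtdna": 0.91},
    {"id": 2, "pcr_success": 0, "mtdna": None},
    {"id": 3, "pcr_success": 1, "mtdna": 1.05},
    {"id": 3, "pcr_success": 1, "mtdna": 1.05},  # exact duplicate of the row above
]


def drop_duplicates(table):
    """Keep the first occurrence of every row; later exact copies are dropped."""
    seen, unique = set(), []
    for row in table:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def keep_pcr_success(table):
    """Keep rows whose PCR succeeded (assumed to be coded as 1)."""
    return [row for row in table if row["pcr_success"] == 1]


deduplicated = drop_duplicates(rows)
cleaned = keep_pcr_success(deduplicated)
print(len(rows), len(deduplicated), len(cleaned))  # 4 3 2
```

Comparing the row count before and after each step, as in the last line, is what makes the change in dimensions traceable.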

Augmenting

  • Classify BMI and DFI into categories

  • New columns derived from TNM notation

  • Add ‘group’ as a string column

  • New dimensions: 387 rows, 18 columns
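
A minimal sketch of the augmentation steps above. It makes several assumptions that are not in the slides: BMI classes use the standard WHO cut-offs, TNM values are stored as strings such as `T2aN0M1`, and `group` is coded 0 for control and 1 for case. The DFI classes are left out because their cut-offs are not stated. All column names here are hypothetical.

```python
import re

# Assumed TNM string layout, e.g. "T2aN0M1" (not confirmed by the slides).
TNM_PATTERN = re.compile(r"T(\w+?)N(\w+?)M(\w+)")

# Assumed coding of the `group` column.
GROUP_LABELS = {0: "control", 1: "case"}


def bmi_class(bmi):
    """Classify BMI (kg/m^2) with the standard WHO cut-offs (an assumption)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def split_tnm(tnm):
    """Split a TNM string into its T, N and M parts; missing values give None."""
    match = TNM_PATTERN.fullmatch(tnm) if tnm else None
    if match is None:
        return {"T": None, "N": None, "M": None}
    t, n, m = match.groups()
    return {"T": t, "N": n, "M": m}


def augment(row):
    """Return a copy of `row` with the derived columns added."""
    new_row = dict(row)
    new_row["bmi_class"] = bmi_class(row["bmi"])
    for part, value in split_tnm(row["tnm"]).items():
        new_row[f"tnm_{part.lower()}"] = value
    new_row["group_label"] = GROUP_LABELS[row["group"]]
    return new_row


example = augment({"bmi": 27.3, "tnm": "T2aN0M1", "group": 1})
print(example["bmi_class"], example["tnm_t"], example["group_label"])  # overweight 2a case
```

Controls have no tumor stage, so `split_tnm` returns `None` for all three parts instead of failing on an empty value.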

Flowchart for project flow

Boxplot with continuous variables, any outliers?

Boxplot with discrete variables, any outliers?
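
The outlier question on the boxplot slides can be made concrete with the 1.5 × IQR rule that boxplot whiskers conventionally follow. The sketch below is one way to list the flagged points. The values are invented, and quartile conventions differ slightly between tools, so the flagged points may not match a given plotting library exactly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one suspiciously large value.
measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(iqr_outliers(measurements))  # [50]
```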

Re-creating plot from the article

[Figure: article visualization]

A better biomarker for prostate cancer?

Logistic regression, excl. PSA

Significant p-values:
Maybe the distribution of DFI classes is skewed?

Logistic regression, incl. PSA

Significant p-values:
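
To show where a p-value in a logistic regression comes from, here is a small sketch for a single predictor (for example mtDNA copy number or PSA). It fits the model by Newton-Raphson and derives a Wald p-value for the slope. The data are invented, and this is not the analysis code behind the slides. A model with several predictors would normally be fitted with a statistics package.

```python
import math


def score_and_information(b0, b1, x, y):
    """Return the gradient (g0, g1) and information matrix entries (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # predicted P(y = 1)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x); return (b0, b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0, using the normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical biomarker values and case status (1 = case, 0 = control).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

b0, b1, se_b1 = fit_logistic(x, y)
p_value = wald_p_value(b1, se_b1)
print(f"slope={b1:.3f}, odds ratio={math.exp(b1):.3f}, p={p_value:.3f}")
```

The exponentiated slope is the odds ratio per unit increase in the predictor, which is usually the quantity reported next to the p-value.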

Principal component analysis (PCA)

[Figures: PCA plots]

PCA
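
As a reminder of what the PCA step computes, the sketch below standardizes a few variables, builds their correlation matrix, and extracts the first principal component by power iteration. The three variables are invented. A real analysis would normally use a library routine that returns all components at once.

```python
import math
import statistics


def standardize(columns):
    """Scale each column to mean 0 and sample standard deviation 1."""
    scaled = []
    for col in columns:
        mean, sd = statistics.mean(col), statistics.stdev(col)
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def covariance_matrix(columns):
    """Sample covariance matrix of already centred columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]


def first_component(cov, iterations=500):
    """Power iteration: return (unit loading vector, variance) of the first component."""
    p = len(cov)
    # Uneven start vector, to make an exactly orthogonal start unlikely.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, variance


# Hypothetical variables: `a` and `b` move together, `c` does not.
a = [1, 2, 3, 4, 5, 6]
b = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
c = [5, 3, 6, 2, 7, 4]

cov = covariance_matrix(standardize([a, b, c]))
loadings, variance = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance / total_variance
print(f"PC1 explains {explained:.0%} of the variance")
```

Standardizing first means the covariance matrix is the correlation matrix, so variables measured on different scales contribute equally.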

Some more data exploration

Interesting finding during exploratory data analysis

Conclusion

  • We can support the article's conclusion that mtDNA copy number is a biomarker for prostate cancer (i.e., the result is reproducible)
  • PSA levels seem to be an even better biomarker
  • Both of the above are supported by logistic regression
  • Conclusion for PCA?
  • Did we find anything interesting for EDA?